What about Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator

Abstract

We study the Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. This extension enables PeVFA to preserve the values of multiple policies at the same time and brings an appealing characteristic, namely value generalization among policies. We formally analyze this value generalization under Generalized Policy Iteration (GPI). From both theoretical and empirical perspectives, we show that the generalized value estimates offered by PeVFA may have lower initial approximation error to the true values of successive policies, which is expected to improve consecutive value approximation during GPI. Based on these clues, we introduce a new form of GPI with PeVFA, which leverages value generalization along the policy improvement path. Moreover, we propose a representation learning framework for RL policies, providing several approaches to learn effective policy embeddings from policy network parameters or state-action pairs. In our experiments, we evaluate the efficacy of the value generalization offered by PeVFA and of policy representation learning in several OpenAI Gym continuous control tasks. As a representative instance of algorithm implementation, Proximal Policy Optimization (PPO) re-implemented under the paradigm of GPI with PeVFA achieves about 40% performance improvement over its vanilla counterpart in most environments.
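The central idea of the abstract, a value function that takes a policy representation as an extra input, can be illustrated with a minimal sketch. This is not the paper's implementation: the linear model, the cross-term features, the one-dimensional policy embedding, and the toy target `toy_true_value` are all assumptions made here for illustration, and every function name is hypothetical. The sketch fits a single approximator on the values of several training policies and then queries it for a policy it never saw.

```python
import random


def features(state, policy_embedding):
    """Joint features of a state and a policy embedding (hypothetical choice)."""
    feats = [1.0]
    feats += list(state)
    feats += list(policy_embedding)
    # Cross terms let the estimate depend on state-policy interactions.
    feats += [s * p for s in state for p in policy_embedding]
    return feats


def pevfa_value(weights, state, policy_embedding):
    """V(s, chi_pi): a linear value estimate over the joint features."""
    feats = features(state, policy_embedding)
    return sum(w * f for w, f in zip(weights, feats))


def sgd_step(weights, state, policy_embedding, target, lr=0.05):
    """One squared-error gradient step toward a value target."""
    feats = features(state, policy_embedding)
    error = pevfa_value(weights, state, policy_embedding) - target
    return [w - lr * error * f for w, f in zip(weights, feats)]


def toy_true_value(state, policy_embedding):
    """Toy stand-in for V^pi(s); invented for this example only."""
    return 0.5 * state[0] * policy_embedding[0] - 0.2 * policy_embedding[0]


rng = random.Random(0)
weights = [0.0] * len(features([0.0], [0.0]))

# One approximator is trained on the values of several policies at once.
train_policies = [[-1.0], [-0.5], [0.5], [1.0]]
for _ in range(5000):
    s = [rng.uniform(-1.0, 1.0)]
    chi = rng.choice(train_policies)
    weights = sgd_step(weights, s, chi, toy_true_value(s, chi))

# Query the value of a policy that was never used for training.
unseen_policy = [0.75]
state = [0.8]
estimate = pevfa_value(weights, state, unseen_policy)
truth = toy_true_value(state, unseen_policy)
print(f"estimate={estimate:.4f}  toy target={truth:.4f}")
```

In this toy setting the target is exactly representable by the chosen features, so the estimate for the unseen policy lands close to its target. In practice the policy embedding would be learned (for example from policy network parameters or state-action pairs, as the abstract describes) and the approximator would typically be a neural network.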

Similar resources

Value Function Approximation and Policy Performance

Fig. 1 gives a geometric interpretation of value function approximation. We may think of $J^*$ as a vector in $\mathbb{R}^{|S|}$; by considering approximations of the form $\tilde{J} = \Phi r$, we restrict attention to the hyperplane $J = \Phi r$ in the same space. Given a norm $\|\cdot\|$ (e.g., the Euclidean norm), an ideal value function approximation algorithm would choose $r$ minimizing $\|J^* - \Phi r\|$; in other words, it would find the projec...

Value-Oriented Policy Taking and Contextual Architecture in Historical Context

Today, context-oriented architecture in historical settings is one of the most important topics in the field of architecture, because of its inherent value and practical concepts, and it is directly related to the knowledge, awareness, and decision-making of the individual or individuals involved. The deeper, more complete, and more accurate this knowledge is, the more valuable what will be extracted ...

Origins of Armenia’s Foreign Policy and Its Foreign Policy Towards Iran

Foreign policy takes root in complicated matters, and this may be even more true of Armenia. Although the new government of Armenia is less than 20 years old, the people of this territory were the first to officially accept Christianity. In the distant past, these people were part of great empires such as Iran, Rome, and Byzantium. Armenia is regarded as a nation with a privileged hist...

Policy Gradient With Value Function Approximation For Collective Multiagent Planning

Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDec-POMDP where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our main contrib...

Characterizing a Brain-Based Value-Function Approximator

The field of Reinforcement Learning (RL) in machine learning relates significantly to the domains of classical and instrumental conditioning in psychology, which give an understanding of biology’s approach to RL. In recent years, there has been a thrust to correlate some machine learning RL algorithms with brain structure and function, a benefit to both fields. Our focus has been on one such st...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i8.20820